Abstract: Rewriting is a common approach to logic optimization
based on local transformations. Most commercially available
logic synthesis tools include a rewriting engine that may be used
multiple times on the same netlist during optimization. This paper
presents an And-Inverter graph based rewriting algorithm using
5-input cuts. The best circuits are pre-computed for a subset
of NPN classes of 5-variable functions. Cut enumeration and
Boolean matching are used to identify replacement candidates.
The presented approach is expected to complement existing
rewriting approaches, which are usually based on 4-input cuts.
The experimental results show that, by adding the new rewriting
algorithm to the ABC synthesis tool, we can further reduce the area
of heavily optimized large circuits by 5.57% on average.

Index Terms: Logic optimization, rewriting, NPN classes, cut
enumeration, Boolean matching



I. Introduction 

Logic optimization approaches can be divided into algorithmic
methods, which are based on global transformations, and
rule-based methods, which are based on local transformations.
Rule-based methods, also called rewriting, use a set of rules
that are applied when certain patterns are found. A rule
transforms a pattern for a local sub-expression, or a
sub-circuit, into another equivalent one. Since the rules need to
be described, and hence the available types of operations/gates
must be known, the rule-based approach usually requires that the
description of the logic is confined to a limited number of
operation/gate types such as AND, OR, XOR, NOT, etc. In addition,
the transformations have limited optimization capability since
they are local in nature. Examples of rule-based systems include
LSS and SOCRATES.

Algorithmic methods use global transformations such as
decomposition or factorization, and are therefore much more
powerful than rule-based methods. However, general Boolean
methods, including don't care optimization, do not scale well
for large functions. Algebraic methods are fast and robust, but
they are not complete and thus often give lower quality results.
For these reasons, industrial logic synthesis systems normally
use algebraic restructuring methods in combination with
rule-based methods.

In this paper, we propose a new rewriting algorithm based on
5-input cuts. In the algorithm, the best circuits are
pre-computed for a subset of NPN classes of 5-variable functions.
A cut enumeration technique [4] is used to find 5-input cuts for
all nodes, and some of them are replaced with a best circuit.
A Boolean matcher is used to map a 5-input function to its
canonical form. The presented approach is expected to complement
existing rewriting approaches, which are usually based on 4-input
cuts. Our experimental results show that, by adding the new
rewriting algorithm to the ABC synthesis tool [6], we can further
reduce the area of heavily optimized large circuits by 5.57% on
average.

The paper is organized as follows. Section II describes the main
notions and definitions used in the sequel. Section III
summarises previous work. Section IV presents the proposed
approach. Section V shows experimental results. Section VI
concludes the paper and discusses open problems.

II. Background 

A Boolean network is a directed acyclic graph, of which the 
nodes represent logic gates, and the directed edges represent 
connections of the gates. A network is also referred to as a 
circuit. 

A node of the network has zero or more fanins, and zero or more
fanouts. A fanin of a node n is a node n_in such that there
exists an edge from n_in to n. Similarly, a fanout of a node n
is a node n_out such that there is an edge from n to n_out. The
primary inputs (PIs) of a network are the zero-fanin nodes of
the network. The primary outputs (POs) of a network are a subset
of all nodes. If a network contains flip-flops, the inputs/outputs
of the flip-flops are treated as POs/PIs of the network.

An And-Inverter graph (AIG) is a network in which each node
is either a PI or a 2-input AND gate, and each edge is negatable.
An AIG is structurally hashed to ensure uniqueness of the
nodes. The area of an AIG is measured by the number of
nodes in the network.
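
To illustrate structural hashing, the following minimal Python sketch keeps one AND node per ordered pair of fanin literals. The literal encoding (2 * node + complement bit, with node 0 as the constant) and the names AIG, input_lit and and_ are assumptions of this sketch, not the data structures used in ABC.

```python
class AIG:
    """Tiny And-Inverter graph with structural hashing (illustrative only)."""

    def __init__(self, num_inputs):
        # Assumed encoding: literal = 2 * node + complement bit.
        # Node 0 is the constant (literal 0 = false, literal 1 = true);
        # nodes 1..num_inputs are the primary inputs.
        self.fanins = [None] * (num_inputs + 1)
        self.strash = {}  # (literal, literal) -> literal of the AND node

    def input_lit(self, i):
        return 2 * (i + 1)

    @staticmethod
    def negate(lit):
        return lit ^ 1

    def and_(self, a, b):
        if a > b:
            a, b = b, a              # order fanins so AND(a, b) == AND(b, a)
        if a == 0:
            return 0                 # false AND x = false
        if a == 1:
            return b                 # true AND x = x
        if a == b:
            return a                 # x AND x = x
        if a == (b ^ 1):
            return 0                 # x AND NOT x = false
        key = (a, b)
        if key not in self.strash:   # create the node only if it is new
            self.fanins.append(key)
            self.strash[key] = 2 * (len(self.fanins) - 1)
        return self.strash[key]

    def num_ands(self):
        return len(self.strash)      # area in the sense defined above
```

Requesting the same AND twice, in either fanin order, returns the same literal, so the node count (the area) does not grow.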

A cut of a node n is a set C of nodes such that any path from
a PI to n must pass through at least one node in C. Node n
itself forms a trivial cut. The nodes in C are called the leaves
of cut C. A cut C is K-feasible if |C| ≤ K; additionally, C is
called a K-input cut if |C| = K.
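
As a hedged illustration of these definitions, the sketch below enumerates all K-feasible cuts bottom-up: the cuts of an AND node are its trivial cut plus every union of one cut from each fanin that has at most K leaves. The function name enumerate_cuts and the dictionary-based network are assumptions of this sketch; it does not filter dominated cuts and is not the enumeration procedure of [4].

```python
from itertools import product


def enumerate_cuts(fanins, k):
    """Return {node: set of K-feasible cuts}, each cut a frozenset of leaves.

    fanins maps every AND node to its pair of fanin nodes; nodes that do
    not appear as keys are treated as primary inputs.
    """
    cuts = {}

    def cuts_of(n):
        if n not in cuts:
            result = {frozenset([n])}            # the trivial cut {n}
            if n in fanins:
                f0, f1 = fanins[n]
                for c0, c1 in product(cuts_of(f0), cuts_of(f1)):
                    merged = c0 | c1
                    if len(merged) <= k:         # keep K-feasible cuts only
                        result.add(merged)
            cuts[n] = result
        return cuts[n]

    for n in fanins:
        cuts_of(n)
    return cuts
```

The K-input cuts of a node n are then the members of cuts[n] with exactly K leaves.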

The level of a node n is the number of edges of the longest 
path from any PI to n. The depth of a network is the largest 
level among all internal nodes of the network. 

Two Boolean functions, F and G, are NPN-equivalent, and
belong to the same NPN equivalence class, if F can be
transformed into G through negation of inputs (N), permutation of
inputs (P), and negation of the output (N).
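
The definition can be made concrete with a brute-force sketch that tries every input permutation, input negation and output negation, and picks the numerically smallest truth table as the class representative. Choosing the smallest table as the canonical form is an assumption of this sketch; a practical Boolean matcher avoids this exhaustive search. For n inputs the loop visits n! * 2^n * 2 transformations, which is 7680 for n = 5.

```python
from itertools import permutations


def npn_canonical(tt, n):
    """Smallest truth table NPN-equivalent to tt (brute force).

    tt is an integer whose bit m is the function value for the input
    assignment in which variable i takes the value (m >> i) & 1.
    """
    size = 1 << n
    full = (1 << size) - 1
    best = tt
    for perm in permutations(range(n)):          # P: permute inputs
        for neg in range(size):                  # N: negate a subset of inputs
            g = 0
            for m in range(size):
                x = m ^ neg
                src = 0
                for i in range(n):
                    if (x >> i) & 1:
                        src |= 1 << perm[i]
                if (tt >> src) & 1:
                    g |= 1 << m
            best = min(best, g, g ^ full)        # N: optionally negate output
    return best
```

For two variables, AND (0b1000), OR (0b1110) and NAND (0b0111) all map to the same representative, whereas XOR (0b0110) falls into a different class.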

III. Previous Work 

Rewriting of networks was introduced in the early logic 
synthesis systems. SOCRATES and the IBM system
performed rewriting under a set of rewriting rules to replace 
a combination of library gates with another combination of 
gates which had a smaller area or delay. In SOCRATES, 
these rules were managed in an expert system deciding which 
ones to apply and when. The rules in SOCRATES were 
written by human designers, based on personal experience and 
observation of experimental results. 

In the MIS system [10], which later developed into SIS [11],
local transformations such as simplification were used to
locally optimize a multi-level network after global optimization.
Two-level minimization methods such as ESPRESSO were
used to minimize the functions associated with the nodes in
the network. Similar methods [12] were also included in the works
of [13], [14], [15].

A rule-based rewriting method was used to simplify
AND-OR-XOR networks in the multi-level synthesis approach
presented in [16].

The AIG-based rewriting technique presented in [17] is used as a
way to compress circuits before formal verification. Rewriting 
is performed in two steps. In the first step, which happens only 
once when the program starts, all two-level AIG subgraphs are 
pre-computed and stored in a table by their Boolean functions. 
In the second step, the AIG is traversed in topological order. 
The two-level AIG subgraphs of each node are found and 
the functionally equivalent pre-computed subgraphs are tried 
as the implementation of the node, while logic sharing with 
existing nodes is considered. The subgraph leading to the least
number of overall nodes is used as the replacement of the
original subgraph. 

An improved AIG rewriting technique for pre-mapping
optimization is presented in [18]. It uses 4-input cuts instead
of two-level subgraphs in rewriting, and preserves the number
of logic levels, so the area is reduced without increasing
delay. Additionally, AIG balancing, which minimizes delay
without increasing area, is used together with rewriting to
achieve better results. Iterating these two processes forms
a new technology-independent optimization flow, which is
implemented in the sequential logic synthesis and verification
system ABC [6]. Experiments show that this implementation
scales to very large designs and is much faster than SIS [11]
and MVSIS [19], while resulting in circuits of the same or
better quality.

IV. AIG Rewriting Using 5-Input Cuts 
The presented algorithm can be divided into two parts: 

1) Best circuit generation 

2) Cut enumeration and replacement 

Part 1 of the algorithm tries to find the optimal circuits for a
subset of "practical" 5-variable NPN classes, and stores these
circuits. Part 2 of the algorithm enumerates all 5-input cuts in
the target circuit, and chooses to replace a cut with a suitable
best circuit.

In the implementation of rewriting using 4-input cuts
in [18], pre-computed tables of canonical forms and the
transformations are kept for all 2^16 4-input functions [6], [18].
As we extend rewriting to 5-input cuts, the size of these tables
becomes 2^32, i.e. too large to use in a program that runs on
a regular computer. In our implementation, we use a Boolean
matcher [5] to dynamically calculate the canonical form of
a truth table and the corresponding transformation from the
original truth table.
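
The growth of the table can be checked with a few lines of Python. The number of K-input Boolean functions is 2^(2^K); the storage of 4 bytes per entry used in the usage note below is purely an assumption of this sketch (one 32-bit canonical truth table per entry, with no transformation data), not a figure from the implementation.

```python
def table_entries(k):
    """Number of distinct Boolean functions of k inputs: 2^(2^k)."""
    return 2 ** (2 ** k)


def table_bytes(k, bytes_per_entry):
    """Size of a full look-up table, given an assumed entry size in bytes."""
    return table_entries(k) * bytes_per_entry
```

With these definitions, table_entries(4) is 65536 and table_entries(5) is 4294967296; under the assumed 4 bytes per entry, a full 5-input table would need 16 GiB.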

A. Best circuit generation 

Similarly to [18], we pre-compute the candidate circuits
for each NPN class so they can be directly used later. There
are 616126 NPN equivalence classes for 5-input functions,
among which only 2749 classes appear in all IWLS 2005
benchmarks [20] as 5-feasible cuts. We picked 1185 of them
with more than 20 occurrences, and generated best circuits for
representative functions of these classes.

Due to the expanded complexity of the problem, we had to
make some trade-offs between the quality of the circuits and
the time and memory usage of our algorithm. Our implementation
has the following differences compared to [18]:

• Use of Boolean matcher to calculate canonical form, 
instead of table look-up. 

• Use of a hash map to store the candidate best circuits,
instead of using a full table.

• When deciding whether to store a node in the node list, a 
node with the same cost as an existing node is discarded, 
instead of being stored in the list. 

• Nodes of both canonical functions and the complements of
the canonical functions are used as candidate circuits,
while in [18] complement functions are not used.

• When the number of nodes reaches an upper limit, a 
reduction procedure is performed before the generation 
continues, leaving only the nodes used in the circuit table. 

We use two structures to store the best circuits: the forest,
a list of all nodes, and the table, which stores only the pointers
to the nodes in the list that represent canonical functions or
their complements. In the forest, a node can be either an AND
node or an XOR node, and the two incoming edges of a node have
complementation attributes. The cost of a node is the number
of AND nodes plus twice the number of XOR nodes that are
reachable from this node towards the inputs.
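
The cost measure can be sketched as a traversal that counts each reachable gate once, so that shared sub-circuits are not charged twice. The dictionary layout and the name node_cost are assumptions of this sketch.

```python
GATE_COST = {"AND": 1, "XOR": 2}   # weights stated in the text above


def node_cost(root, gates):
    """Cost of root: #AND + 2 * #XOR over all gates reachable from root.

    gates maps a gate id to (kind, fanin0, fanin1); ids that are not keys
    (the constant and the variables) are leaves with no cost.
    """
    seen = set()
    stack = [root]
    total = 0
    while stack:
        n = stack.pop()
        if n in seen or n not in gates:
            continue                     # already counted, or a leaf
        seen.add(n)
        kind, a, b = gates[n]
        total += GATE_COST[kind]
        stack.extend((a, b))
    return total
```

For example, with g1 = AND(a, b), g2 = XOR(g1, c) and g3 = AND(g1, g2), the cost of g3 is 4: two AND gates plus one XOR gate, with the shared gate g1 counted once.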

First, the constant zero node and five nodes for single 
variables are added into the forest. The constant node and one 
of the variable nodes are added to the table, since all variable 
nodes are NPN equivalent. Then, for each pair of nodes in the 
forest, five types of 2-input gates are created, using the pair 
as inputs: 

• AND gate 

• AND gate with first input complemented 

• AND gate with second input complemented 

• AND gate with both inputs complemented 
• XOR gate

A newly created node is stored in the forest if the following 
conditions are met, otherwise it is discarded: 

• The cost of the node is lower than any other node with 
the same functionality. 

• The cost of the node is lower than or equal to any other 
node with NPN-equivalent functionality. 

In addition, the pointer to this node is added to the table if 
the following condition is also met: 

• The function of the node is the canonical form represen- 
tative, or its complement, in the NPN-equivalence class 
it belongs to. 

When the number of nodes in the forest reaches an upper 
limit, a node reduction procedure is performed, where only 
the reachable nodes from the nodes in the table are left in the 
forest. 

Algorithm 1 GenerateBestCircuits(P, u, n_max): Generate
candidate best circuits for a subset of NPN classes of 5-input
Boolean functions.

Add constant zero node to N and C
Add variable nodes to N
Add the node of one variable to C
for each i from 2 to |N| do
    for each j from 1 to i - 1 do
        TryNode(AND, N_i, N_j)
        TryNode(AND, Not(N_i), N_j)
        TryNode(AND, N_i, Not(N_j))
        TryNode(AND, Not(N_i), Not(N_j))
        TryNode(XOR, N_i, N_j)
        if num. of uncovered practical NPN classes < u then
            return
        end if
        if |N| > n_max then
            ReduceNodes()
            i <- 1
            break
        end if
    end for
end for



The algorithm stops when the number of uncovered "prac- 
tical" classes is smaller than a threshold value. 

Finally, the generated best circuits are stored, so they can 
be used later when rewriting takes place. 

The pseudo-code of the proposed best circuit generation
algorithm is shown in Algorithm 1. The GenerateBestCircuits
procedure returns a node list N and a table of nodes C recording
the candidate best circuits for a subset of NPN classes. It takes
three parameters. Parameter P is a set of truth tables of
"practical" 5-variable functions. This set contains about 1200
5-input canonical NPN representatives with 20 or more
occurrences in the IWLS 2005 benchmarks. Parameter u is an
integer indicating the acceptable number of uncovered practical
NPN classes; n_max is an integer indicating the limit on the
number of nodes beyond which a node reduction is needed. In our
implementation, u is set to 60, and n_max is set to 10000000.

The pseudo-code for procedure TryNode is shown in
Algorithm 2. TryNode creates a node, and determines whether
to put it into the node list and the circuit table. Parameter
T ∈ {AND, XOR} indicates whether the new gate should be
an AND gate or an XOR gate. Parameters n_0 and n_1 are the two
fanins of the new gate.

Procedure ReduceNodes reduces the node list by removing 
the nodes that are not used in any circuit in the circuit table. 

Procedure Canonicalize calculates the canonical form of 
the truth table of a given function. 

In the algorithms, variables N, C and M are globally
accessible. N denotes the list of all nodes. C is a hash map
of the candidate circuits; each of its entries is a set of nodes
storing the root nodes of the candidate circuits for the NPN class
of this entry. M is a temporary hash map that stores the current
minimum costs of all functions.



Algorithm 2 TryNode(T, n_0, n_1): Create a node of type T
with fanins n_0 and n_1, and determine whether to put it into N
or C.

n_new <- CreateNode(T, n_0, n_1)
t <- GetTruth(n_new)
if M_t does not exist or M_t > Cost(n_new) then
    M_t <- Cost(n_new)
else
    return
end if
t_canon <- Canonicalize(t)
if ∃n ∈ C_{t_canon} such that Cost(n) < Cost(n_new) then
    return
end if
add n_new to the end of list N
if t ≠ t_canon and t ≠ Complement(t_canon) then
    return
end if
if ∃n ∈ C_{t_canon} such that Cost(n) > Cost(n_new) then
    C_{t_canon} <- ∅
end if
if t = t_canon then
    C_{t_canon} <- C_{t_canon} ∪ {n_new}
else
    C_{t_canon} <- C_{t_canon} ∪ {Not(n_new)}
end if
return
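
The generation procedure can be tried out on small functions with the following Python sketch. It is a simplified reading of Algorithms 1 and 2 under stated assumptions: canonicalization is a brute-force search for the smallest NPN-equivalent truth table instead of the Boolean matcher [5], the set P, the threshold u and ReduceNodes are omitted (the loop simply runs until no new node is stored), and all names are hypothetical.

```python
from itertools import permutations

GATE_COST = {"AND": 1, "XOR": 2}


def canonicalize(tt, n):
    """Brute-force NPN canonical form: smallest equivalent truth table."""
    size = 1 << n
    full = (1 << size) - 1
    best = tt
    for perm in permutations(range(n)):
        for neg in range(size):
            g = 0
            for m in range(size):
                x = m ^ neg
                src = sum(((x >> i) & 1) << perm[i] for i in range(n))
                g |= ((tt >> src) & 1) << m
            best = min(best, g, g ^ full)
    return best


def generate_best_circuits(n):
    size = 1 << n
    full = (1 << size) - 1
    forest = []      # list N: (truth table, cone of gate ids, cost)
    weight = {}      # gate id -> gate cost, so shared gates are priced once
    min_cost = {}    # map M: truth table -> lowest cost seen so far
    table = {}       # map C: canonical table -> [(cost, index, complemented)]

    # Seed the forest with constant zero and the variables (cost 0).
    forest.append((0, frozenset(), 0))
    min_cost[0] = 0
    table[0] = [(0, 0, False)]
    for v in range(n):
        tt = sum(1 << m for m in range(size) if (m >> v) & 1)
        forest.append((tt, frozenset(), 0))
        min_cost[tt] = 0
        canon = canonicalize(tt, n)
        if tt in (canon, canon ^ full) and canon not in table:
            table[canon] = [(0, len(forest) - 1, tt != canon)]

    def try_node(kind, a, inv_a, b, inv_b):
        ta, cone_a, _ = forest[a]
        tb, cone_b, _ = forest[b]
        ta ^= full if inv_a else 0
        tb ^= full if inv_b else 0
        t = ta & tb if kind == "AND" else ta ^ tb
        cone = cone_a | cone_b
        cost = sum(weight[g] for g in cone) + GATE_COST[kind]
        if t in min_cost and min_cost[t] <= cost:
            return                           # not cheaper for this function
        min_cost[t] = cost
        canon = canonicalize(t, n)
        entries = table.get(canon, [])
        if any(c < cost for c, _, _ in entries):
            return                           # cheaper NPN-equivalent exists
        gid = len(weight)
        weight[gid] = GATE_COST[kind]
        forest.append((t, cone | {gid}, cost))
        if t not in (canon, canon ^ full):
            return                           # stored in N but not in C
        if any(c > cost for c, _, _ in entries):
            entries = []                     # drop costlier candidates
        table[canon] = entries + [(cost, len(forest) - 1, t != canon)]

    i = 1
    while i < len(forest):                   # the forest grows while scanned
        for j in range(i):
            for inv_i, inv_j in ((False, False), (True, False),
                                 (False, True), (True, True)):
                try_node("AND", i, inv_i, j, inv_j)
            try_node("XOR", i, False, j, False)
        i += 1
    return forest, table
```

Calling generate_best_circuits(2) covers the four NPN classes of 2-variable functions: the constant and the single variable at cost 0, the AND class at cost 1 and the XOR class at cost 2.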



B. Cut enumeration and replacement 

We use a cut enumeration and replacement technique quite
similar to the one in [18]. The main difference is that we use a
Boolean matcher to calculate the canonical form of the NPN
representative as well as the transformation to the canonical
form from the original function, while in [18], a faster table
look-up is used.

The Boolean matcher proposed in [5] calculates only the
canonical form representation. We modified the program so it 
can simultaneously generate the NPN transformation, which is 
needed when connecting the replacement graph to the whole 
circuit. 

Nodes are traversed in topological order. For each node,
starting from the PIs to the POs, all of its 5-input cuts are
listed [4]. The canonical form truth table and the corresponding
NPN transformation of each cut are calculated using the
Boolean matcher [5]. Each cut is then evaluated to determine
whether there is a suitable replacement that does not increase the
area of the network. Finally, the cut with the greatest gain
is replaced by a best circuit. In the presented algorithm,
zero-cost replacement is accepted, since it is a useful approach
for re-arranging the AIG structure to create more opportunities in
subsequent rewriting [18].

The pseudo-code of the rewriting procedure is shown in
Algorithm 3. For each node in the network, N_best denotes the
largest number of nodes saved by replacing a cut of the node
by a pre-computed candidate circuit; c_best and u_best denote
the corresponding candidate circuit and the original cut,
respectively. These three variables are updated simultaneously
if there exists a possible replacement.

Procedure ConnectToLeaves(N, c, u, Trans) connects the
fanins of candidate circuit c to the leaves of cut u, following
the NPN transformation Trans.

Procedure Reference(N, c) increases the reference count
of the nodes belonging to sub-circuit c in network N, whereas
Dereference(N, c) decreases the reference count. When the
reference count of a node becomes zero, the node no longer
belongs to the network.
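
A minimal sketch of reference counting is given below. It follows the usual recursive scheme in which a fanin is visited only when its count reaches zero; the function names, the dictionary-based network and the simplification that the cut leaves are primary inputs are assumptions of this sketch.

```python
def count_refs(fanins, outputs):
    """Initial reference counts: one per fanout edge, one per primary output."""
    refs = {}
    for f0, f1 in fanins.values():
        refs[f0] = refs.get(f0, 0) + 1
        refs[f1] = refs.get(f1, 0) + 1
    for o in outputs:
        refs[o] = refs.get(o, 0) + 1
    return refs


def dereference(root, fanins, refs):
    """Detach root's cone; return how many AND nodes leave the network."""
    count = 1                                  # the root itself
    for f in fanins.get(root, ()):
        refs[f] -= 1
        if refs[f] == 0 and f in fanins:       # fanin is no longer used
            count += dereference(f, fanins, refs)
    return count


def reference(root, fanins, refs):
    """Attach root's cone; return how many AND nodes enter the network."""
    count = 1
    for f in fanins.get(root, ()):
        if refs[f] == 0 and f in fanins:       # fanin must be re-created
            count += reference(f, fanins, refs)
        refs[f] += 1
    return count
```

The gain of a candidate replacement is the number returned by dereferencing the old cone minus the number returned by referencing the new one. Nodes that have fanouts outside the cone, such as a shared AND gate, keep a positive count and are therefore not counted as saved.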

Algorithm 3 RewriteNetwork(N, C): Rewrite a Boolean
network N using candidate circuits stored in hash map C.

for each node n in N, in topological order do
    N_best <- -1
    c_best <- NULL
    u_best <- NULL
    for each 5-input cut u of n do
        t <- GetTruth(u)
        (t_canon, Trans) <- Canonicalize(t)
        for each candidate circuit c in C_{t_canon} do
            ConnectToLeaves(N, c, u, Trans)
            N_saved <- Dereference(N, u)
            N_added <- Reference(N, c)
            N_gain <- N_saved - N_added
            Dereference(N, c)
            Reference(N, u)
            if N_gain ≥ 0 and N_best < N_gain then
                N_best <- N_gain
                c_best <- c
                u_best <- u
            end if
        end for
    end for
    if N_best = -1 then
        continue
    end if
    Dereference(N, u_best)
    Reference(N, c_best)
end for



In [18], the authors proposed an optimization flow composed
of balance, rewrite and refactor processes, and implemented
it in the tool ABC [6] with the script resyn2. Compared
to [18], rewriting using 5-input cuts exploits larger cuts and
more replacement options, and thus has the potential for getting
the resyn2 script out of local minima, providing better rewriting
opportunities.

V. Experimental Results 

The presented algorithm is implemented using structurally
hashed AIG as an internal circuit representation and integrated
in the ABC synthesis tool as a command rewrite5. To evaluate
its effectiveness, we performed a set of experiments using
IWLS 2005 benchmarks [20] with more than 5000 AIG nodes
after structural hashing. All experiments were carried out
on a laptop with an Intel Core i7 1.6GHz (2.8GHz maximum
frequency) quad-core processor, 6 MB cache, and 4 GB RAM.



First, for each benchmark, we applied the sequence of
commands resyn2; rewrite5; resyn2 in the modified ABC and
compared the result to two consecutive runs of resyn2 without
rewrite5 in between.

The results are summarized in Table I. Columns labeled by
A give the area in terms of AIG nodes. Columns labeled by t
give the runtime. The improvement of area and the increase of
runtime are then calculated and shown in the last two columns.
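
The text does not spell out how the last two columns are computed; a natural reading, assumed here, is the relative change with respect to the baseline flow. The sketch below uses hypothetical function names and checks this reading against the runtimes reported for vga_lcd in the triple-resyn2 experiment (15.258 and 22.223), which reproduce the listed 45.65%.

```python
def area_improvement(a_base, a_new):
    """Relative area reduction in percent, measured against the baseline."""
    return 100.0 * (a_base - a_new) / a_base


def runtime_increase(t_base, t_new):
    """Relative extra runtime in percent, measured against the baseline."""
    return 100.0 * (t_new - t_base) / t_base


# Runtimes of vga_lcd without and with rewrite5, as reported in the results.
print(round(runtime_increase(15.258, 22.223), 2))
```

A negative improvement, such as the -0.07% reported for vga_lcd, means that the flow with rewrite5 ended with slightly more AIG nodes than the baseline.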

Table I shows that the average improvement in area achieved
by adding rewrite5 in between two resyn2 runs is 3.50%, at
the cost of 33.18% of extra runtime. This result indicates that
the proposed rewrite5 method is effective in bringing ABC's
resyn2 optimization script out of local minima, leading to
better optimization possibilities.

The second experiment is performed similarly, except that we
used a longer optimization flow: resyn2; rewrite5; resyn2;
rewrite5; resyn2. The result is compared to three consecutive
runs of the resyn2 script.

The result of the second experiment is shown in Table II,
which has the same structure as Table I. The average
improvement in area using the new optimization flow is 4.88%, at
the cost of 46.11% of extra runtime. This result shows the
possibility to further extend the resyn2 sequence by inserting
rewrite5 runs, to achieve even better optimization.

Even longer optimization flows were also tested. The
comparison of average results is summarized in Table III.
The improvement in area converges after a certain number of
resyn2-rewrite5 iterations. The increase of improvement is
insignificant for more than four runs of resyn2.





Flow                   improvement in area   extra runtime
ss -> sws              3.50%                 33.18%
sss -> swsws           4.88%                 46.11%
ssss -> swswsws        5.39%                 47.48%
sssss -> swswswsws     5.57%                 51.21%

NOTE: s stands for resyn2; w stands for rewrite5.

TABLE III
Summary of average results.



VI. Conclusion 

In this paper, we present an AIG-based rewriting technique
that uses 5-input cuts. The technique extends the approach of
AIG rewriting using 4-input cuts presented in [18].
Experimental results show that our algorithm is effective in
driving other optimization techniques, such as the resyn2 script
in ABC, out of local minima. The proposed rewriting technique
might be useful in a new optimization flow combining rewriting
of both 4-input and 5-input cuts.

TABLE I

Effectiveness of improving the double resyn2 optimization flow using rewrite5, on IWLS 2005 benchmarks.

...  1.97%  41.38%
vga_lcd  ...  15.258  88687  22.223  -0.07%  45.65%
wb_conmax  47853  ...  6.759  38095  9.032  1.50%  ...
Average  4.88%  46.11%



TABLE II 

Effectiveness of improving the triple resyn2 optimization flow using rewrite5, on IWLS 2005 benchmarks.